46 research outputs found

    Winter is here: summarizing Twitter streams related to pre-scheduled events

    Pre-scheduled events, such as TV shows and sports games, usually garner considerable public attention. Twitter captures large volumes of discussion about these events in real time. Twitter streams related to pre-scheduled events have two characteristics: (1) spikes in the volume of published tweets reflect the highlights of the event, and (2) some of the published tweets refer to the characters involved in the event, in the context in which they are currently portrayed in a subevent. In this paper, we take advantage of these characteristics to identify the highlights of pre-scheduled events from tweet streams, and we demonstrate a method to summarize these highlights. We evaluate our algorithm on tweets collected around two episodes of Season 7 of the popular TV show Game of Thrones. (Published version)
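The first characteristic above (volume spikes mark highlights) can be illustrated with a minimal sketch. The abstract does not describe the paper's actual detection rule, so everything here is an assumption: the function name `find_spikes`, the trailing-window size, and the mean-plus-`k`-standard-deviations threshold are hypothetical choices for illustration only.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=5, k=2.0):
    """Return indices of time bins whose tweet count exceeds
    mean + k * std of the preceding `window` bins.

    This thresholding rule is an illustrative assumption, not the
    method from the paper.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            spikes.append(i)
    return spikes


# Hypothetical tweets-per-minute counts with two bursts.
counts = [10, 12, 11, 9, 10, 50, 12, 11, 10, 11, 9, 10, 48, 10]
print(find_spikes(counts))  # [5, 12]
```

A trailing window keeps the baseline local, so a burst is judged against the minutes just before it rather than against the whole broadcast.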

    Knowledge Based Template Machine Translation In Low-Resource Setting

    Incorporating tagging into neural machine translation (NMT) systems has shown promising results in helping translate rare words such as named entities (NEs). However, translating NEs in low-resource settings remains a challenge. In this work, we investigate the effect of using tags and NE hypernyms from knowledge graphs (KGs) in parallel corpora under different resource conditions. We find that the tag-and-copy mechanism (tag the NEs in the source sentence and copy them to the target sentence) improves translation in high-resource settings only. Introducing copying also has polarizing effects on the translation of different parts of speech (POS). Interestingly, we find that copy accuracy for hypernyms is consistently higher than that for entities. As a way of avoiding "hard" copying and using hypernyms to bootstrap rare entities, we introduce a "soft" tagging mechanism and find consistent improvement in both high- and low-resource settings.

    Deriving Verb Predicates By Clustering Verbs with Arguments

    Hand-built verb clusters such as the widely used Levin classes (Levin, 1993) have proved useful, but have limited coverage. Verb classes automatically induced from corpus data, such as those from VerbKB (Wijaya, 2016), on the other hand, can give clusters with much larger coverage, and can be adapted to specific corpora such as Twitter. We present a method for clustering the outputs of VerbKB: verbs with their multiple argument types, e.g. "marry(person, person)", "feel(person, emotion)". We make use of a novel low-dimensional embedding of verbs and their arguments to produce high-quality clusters in which the same verb can be in different clusters depending on its argument type. The resulting verb clusters do a better job than hand-built clusters of predicting sarcasm, sentiment, and locus of control in tweets.

    Detecting frames in news headlines and its application to analyzing news framing trends surrounding U.S. gun violence

    Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as “frames”, which, when used in news media, will influence the opinion of their readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation are based on a new dataset of news headlines related to the issue of gun violence in the United States. This Gun Violence Frame Corpus (GVFC) was curated and annotated by journalism and communication experts. Our proposed approach sets a new state-of-the-art performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large-scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018. (Published version)

    Better quality estimation for low resource corpus mining

    University of California, Berkeley. https://aclanthology.org/2022.findings-acl.45/ (First author draft)

    Implementation of a Modified Bicubic Interpolation Method for Image Downsampling (Implementasi Metode Interpolasi Bicubic Modifikasi pada Proses Downsampling Citra)

    Abstract: Downsampling reduces the resolution of an image by discarding some of its pixels, which affects the quality of the resulting image. To preserve image quality, a modified bicubic interpolation method is used in the downsampling process. Modified bicubic interpolation is a variant of bicubic interpolation that offers the same quality with a shorter processing time. This study therefore discusses the implementation of the modified bicubic interpolation method for image downsampling, which computes each new pixel value from the 16 nearest pixel values in the input image. Testing was performed on two types of image, RGB and grayscale, each in two image formats, JPG and BMP. The results show that output file size is most stable for JPG RGB images in the 300–400 KB range and JPG grayscale images in the 0–50 KB range. Processing time is fastest for JPG RGB images in the 200–300 KB range, while for grayscale images it is fastest at 25% scale for JPG images in the 150–200 KB range, at 50% scale for JPG images in the 0–500 KB range, and at 75% scale for BMP images. The highest PSNR is obtained for JPG RGB images in the 100–200 KB range and JPG grayscale images in the 0–50 KB range.
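The idea of computing each output pixel from its 16 nearest input pixels can be sketched as follows. This is standard bicubic interpolation with the Keys cubic convolution kernel, not the modified variant the abstract refers to, whose details are not given there. The function names, the choice `a = -0.5`, the border clamping, and the pixel-centre mapping are all assumptions made for illustration.

```python
import math


def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is a common choice (assumed here)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def sample_bicubic(img, y, x):
    """Interpolate a grayscale image (list of rows) at fractional (y, x)
    from its 4x4 neighbourhood, i.e. the 16 nearest pixels."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for j in range(-1, 3):
        for i in range(-1, 3):
            # Clamp coordinates so border pixels are repeated at the edges.
            yy = min(max(y0 + j, 0), h - 1)
            xx = min(max(x0 + i, 0), w - 1)
            weight = cubic_weight(y - (y0 + j)) * cubic_weight(x - (x0 + i))
            total += img[yy][xx] * weight
    return total


def downsample(img, scale):
    """Downsample a grayscale image by `scale` (e.g. 0.25, 0.5, 0.75)."""
    h, w = len(img), len(img[0])
    out_h, out_w = max(1, round(h * scale)), max(1, round(w * scale))
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Map the output pixel centre back to input coordinates.
            y = (r + 0.5) / scale - 0.5
            x = (c + 0.5) / scale - 0.5
            value = sample_bicubic(img, y, x)
            row.append(min(255, max(0, round(value))))
        out.append(row)
    return out


flat = [[100] * 8 for _ in range(8)]
small = downsample(flat, 0.5)
print(len(small), len(small[0]))  # 4 4
```

Because the kernel weights sum to one for any fractional offset, a uniform image stays uniform after resampling. For an RGB image the same routine would be applied to each channel separately.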

    Implementation of Gradient Features for Counting Motor Vehicles in Traffic Video (Implementasi Fitur Gradien untuk Penghitungan Kendaraan Bermotor pada Video Lalu Lintas)

    A motor vehicle counting system for digital images is a system that can recognize and count motor vehicles by processing the digital images it is given. This data is useful for road planning and development. To obtain the data, a camera records traffic video, and digital image processing techniques are then applied to the video to obtain information on the number of vehicles passing on the road. Background subtraction is used to detect moving objects in the video. Once the moving objects are found, a region of interest is initialized to determine the area to be processed further. The next step is binarization, which converts the image into a binary image. Morphological operations are then applied to obtain a mask of the detected objects. The mask is used to crop the detected objects. Next, the cropped image is converted into a gradient image and its mean gradient is computed. The mean gradient value is used to classify each detected object as a car or a motorcycle. In this study, car counting achieved an accuracy of 84.85%, a precision of 83.33%, and a recall of 76.92%, while motorcycle counting achieved an accuracy of 82.35%, a precision of 85.71%, and a recall of 85.71%.
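The last stage of the pipeline above, classifying a cropped object by its mean gradient, might look roughly like this sketch. The abstract gives neither the gradient operator nor the decision rule, so the forward-difference gradient, the `threshold` value, and the direction of the rule (higher mean gradient means car) are hypothetical placeholders, as are the function names.

```python
def mean_gradient(patch):
    """Mean gradient magnitude of a grayscale patch (list of rows),
    using simple forward differences (an assumed operator)."""
    h, w = len(patch), len(patch[0])
    total, n = 0.0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = patch[y][x + 1] - patch[y][x]
            gy = patch[y + 1][x] - patch[y][x]
            total += (gx * gx + gy * gy) ** 0.5
            n += 1
    return total / n if n else 0.0


def classify(patch, threshold=20.0):
    """Label a cropped object from its mean gradient.

    Both the threshold and the direction of the rule are hypothetical;
    in practice they would be tuned on labelled crops.
    """
    return "car" if mean_gradient(patch) >= threshold else "motorcycle"


flat_patch = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
print(mean_gradient(flat_patch))  # 0.0
```

The earlier stages (background subtraction, binarization, morphology, cropping) would normally come from an image processing library and are omitted here.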

    AugCSE: contrastive sentence embedding with diverse augmentations

    Data augmentation techniques have proven useful in many NLP applications. Most augmentations are task-specific and cannot be used as a general-purpose tool. In our work, we present AugCSE, a unified framework that uses diverse sets of data augmentations to achieve a better, general-purpose sentence embedding model. Building upon the latest sentence embedding models, our approach uses a simple antagonistic discriminator that differentiates the augmentation types. With the finetuning objective borrowed from domain adaptation, we show that diverse augmentations, which often lead to conflicting contrastive signals, can be tamed to produce a better and more robust sentence representation. Our methods achieve state-of-the-art results on downstream transfer tasks and perform competitively on semantic textual similarity tasks, using only unsupervised data. University of California, Berkeley. https://aclanthology.org/2022.aacl-main.30/ (First author draft)